Abstract. We study the nature of cluster selection in Sunyaev-Zel'dovich (SZ) surveys, focusing on single frequency 
observations and using Monte Carlo simulations incorporating instrumental effects, primary cosmic microwave 
background (CMB) anisotropies and extragalactic point sources. Clusters are extracted from simulated maps with 
an optimal, multi-scale matched filter. We introduce a general definition for the survey selection function that 
provides a useful link between an observational catalog and theoretical predictions. The selection function defined 
over the observed quantities of flux and angular size is independent of cluster physics and cosmology, and thus 
provides a useful characterization of a survey. Selection expressed in terms of cluster mass and redshift, on the 
other hand, depends on both cosmology and cluster physics. We demonstrate that SZ catalogs are not simply flux 
limited, and illustrate how incorrect modeling of the selection function leads to biased estimates of cosmological 
parameters. The fact that SZ catalogs are not flux limited complicates survey "calibration" by requiring more 
detailed information on the relation between cluster observables and cluster mass. 
1. Introduction 

Galaxy cluster surveys are important tools for measuring
key cosmological quantities and for understanding
the process of structure formation in the universe
(Bahcall et al. 1999; Rosati et al. 2002). Surveying
for clusters using the Sunyaev-Zel'dovich (SZ) effect
(Sunyaev & Zeldovich 1970; Sunyaev & Zeldovich 1972;
for recent reviews, see Birkinshaw 1999 and
Carlstrom, Holder & Reese 2002) offers a number of
advantages over more traditional methods based on
X-ray or optical imaging. These advantages include good
detection efficiency at high redshift; a selection based on
the thermal energy of the intracluster medium, a robust
quantity relative to any thermal structure in the gas;
and an almost constant mass detection limit with redshift
(Holder et al. 2000; Bartlett 2000; Bartlett 2001).
A new generation of optimized, dedicated instruments,
both large bolometer arrays (Masi et al. 2003;
Runyan et al. 2003; Kosowsky 2004) and interferometers
(Lo et al. 2000; Jones 2002), will soon perform such SZ
cluster surveys, and we may look forward to the large
and essentially full-sky SZ catalog expected from the
Planck mission. In anticipation, many authors have
studied the nature and use of SZ cluster catalogs and
made predictions for the number of objects expected
from various proposed surveys (Holder et al. 2000;
Kneissl et al. 2001). A good example of the potential
of an SZ survey is the use of its redshift distribution
to examine structure formation at high redshift and to
thereby constrain cosmological parameters, such as the
density parameter $\Omega_M$ (Barbosa et al. 1996), and the
dark energy equation-of-state $w$ (Haiman et al. 2001).

An astronomical survey is fundamentally character- 
ized by its selection function, which identifies the subclass 
of objects detected among all those actually present in 
the survey area. It is a function of cluster properties and 
survey conditions. Depending on the nature of the ob- 
servations, relevant cluster properties may include: mass, 
redshift, luminosity, morphology, etc., while key descrip- 
tors of the survey would be sensitivity, angular resolution, 
spectral coverage, etc. The selection function will also depend
on the detection algorithm used to find clusters
in the survey data. Understanding of the selection function
is a prerequisite to any statistical application of the
survey catalog; otherwise, one has no idea how represen- 
tative the catalog is of the parent population actually out 
in the universe. 

Selection function issues for SZ surveys have been
touched on recently by several authors (Bartlett 2001;
Schulz & White 2003; White 2003), while most previous
studies of the potential use of SZ surveys have not examined
this point in detail. For example, predictions of the
redshift distribution of SZ-detected clusters usually assume
that they are point sources, simply selected on their
total flux^. We shall see below that this is not necessarily
the case, and an analysis of cosmological parameters
based on such an assumption would significantly bias the 
results. 

Understanding a survey selection function is difficult. 
By its very nature and purpose, the selection function is 
supposed to tell us about objects that we don't see in the 
survey! Realistic simulations of a survey are central to
determining its selection function (e.g., Adami et al. 2001).
One knows which objects are put into the simulation and 
can then compare them to the subset of objects detected 
by the mock observations. In practice, of course, under- 
standing of a selection function comes only from a combi- 
nation of such simulations and diverse observations taken 
under different conditions and/or in different wavebands; 
full understanding thus comes slowly. 

There are really two distinct issues connected to the 
selection function: object detection, or survey complete- 
ness, and object measurement, which we shall refer to as 
photometry; as a separate issue, one must also determine 
the contamination function. One would like to character- 
ize each detected cluster by determining, for example, its 
total flux, angular size, etc... As practitioners are well 
aware, photometry of extended objects faces many difficul- 
ties that introduce additional uncertainty and, in partic- 
ular, potential bias into the survey catalog. The selection 
function must correct for bias induced by both the detection
and photometric procedures. The two are, however,
distinct steps in catalog construction, and the selection 
function (see below) should reflect this fact. 

The object of the present work is to begin a 
study of SZ selection functions for the host of SZ 
surveys that are being planned, and to propose a 
formalism for their characterization. To this end, 
we have developed a rapid Monte Carlo simulation 
tool (Delabrouille, Melin & Bartlett 2002) that produces
mock images of the SZ sky, including various clustering
and velocity effects, primary cosmic microwave background
(CMB) anisotropies, radio point sources and instrumental
effects. The main goals of such studies, in this
period before actual surveying has begun, are to improve 
understanding of the expected scientific return of a given 
survey and to help optimize observing strategies. 

Our specific aim in the present work is to study selec- 
tion effects in SZ surveys by focusing on single frequency 
observations, such as will be performed by up-coming in- 
terferometers. Most bolometer cameras propose surveys 
at several frequencies, although not necessarily simultane- 
ously; the present considerations are therefore applicable 
to the first data sets from these instruments. This work 
builds on that of Bartlett (2000) by adding the effects of
primary CMB anisotropies, point sources and photometric
errors, and by the use of an optimized cluster detection
algorithm (Melin et al. 2004).

^ The term flux does not really apply in the case of SZ observations,
as the effect is measured relative to the unperturbed
background and may be negative. We shall nevertheless use it
throughout for simplicity.

General considerations concerning the selection function
are given in the next section and used to motivate
our definition given in Eq. (1). We then briefly describe
(Section 3) our simulations, based on a Monte Carlo approach
incorporating cluster correlations and velocities, as
well as our cluster detection and photometry algorithms
built on an optimized spatial filter (details will be given
elsewhere; Melin et al. 2004). A discussion of cluster selection
with this method follows (Section 4), where with a
simple analytic argument, we show how cluster detection
depends on both total flux and angular size. Our main conclusion
is that SZ surveys will not be simply flux limited.
Our simulations support the analytical expectations, and 
they also highlight the difficulty of performing accurate 
photometry on detected clusters. 

We close with a discussion (Section 5) of some implications
for upcoming surveys. The most important is
that the redshift distribution of observed clusters differs
from that of a pure flux-limited catalog; assuming
pure flux selection will therefore lead to biased estimates
of cosmological parameters. In this same section,
we give an explicit example of biased parameter
estimation caused by the presence of incorrectly modeled
excess primary CMB power on cluster scales, as
suggested by the CBI experiment (Mason et al. 2001).
We note that non-trivial cluster selection complicates
survey "calibration" (Bartelmann 2001; Hu 2003;
Majumdar & Mohr 2003; Lima & Hu 2004) because a
size-mass relation must be obtained in addition to a flux- 
mass relation. Photometric errors will further increase the 
difficulty by augmenting scatter in the mass-observable 
relations. 

2. Selection Function: general considerations 

To motivate our definition, we first consider some gen- 
eral properties desired of a survey selection function. 
Fundamentally, it relates observed catalog properties (e.g., 
flux and size) to relevant intrinsic characteristics of the
source population under study. In particular, we want it 
to tell us about the completeness of the survey catalog as 
a function of source properties, which is a measure of the 
selection bias. In addition, we also wish for it to reflect 
the effects of statistical (e.g., photometric) errors. Notice, 
on the other hand, that the selection function will not tell 
us anything about contamination of the catalog by false 
detections; this is another function of observed quantities 
that must be separately evaluated. 

Consider the example of a flux-limited catalog of point 
sources. Neglecting photometric measurement errors, the 
probability that a source at redshift z will find its way 
into the survey catalog is simply given by the fraction of 
sources brighter than the flux limit, which may be calculated
as an integral over the luminosity function at z (e.g.,
Peebles 1998). Extended objects complicate the situation,
for their detection will in general depend on morphology. 



J.-B. Melin et al.: The Selection Function of SZ Cluster Surveys 






One must then define appropriate source descriptors other 
than just a total flux; and even the definition of total flux, 
conceptually simple, becomes problematic (fixed aperture 
flux, isophotal flux, integrated flux with a fitted profile, 
etc.). The choice of descriptors is clearly important and 
the selection function will depend on it. They must en- 
code relevant observational information on the sources and 
represent observables with as little measurement error as 
possible. 

The simplest characterization for extended SZ sources 
would employ a total observed flux, $Y_o$, and a representative
angular size, which we take to be the core radius
$\theta_{co}$. By total flux, we mean the flux density integrated
over the entire cluster profile, out to the virial radius, and 
we express it in a frequency independent manner as the 
integrated Compton-y parameter. We limit ourselves to 
these two descriptors in the ensuing discussion, although 
clearly many others describing cluster morphology are of 
course possible (ellipticity, for example...). How the ob- 
served quantities are actually measured is crucial - mea- 
surement errors and the selection function will both de- 
pend on the technique used. 

Our detected clusters will then populate the observed
parameter space according to some distribution
$dN_o/dY_o d\theta_{co}$. What we really seek, however, is the true
cluster distribution, $dN/dY d\theta_c$, over the intrinsic cluster
parameters $Y$ and $\theta_c$. Measurement errors and catalog
incompleteness both contribute to the difference between
these two distributions. In addition, the catalog will suffer 
from contamination by false detections. 

These general considerations motivate us to define the 
selection function as the joint distribution of $Y_o$ and $\theta_{co}$,
as a function of (i.e., given) $Y$ and $\theta_c$. There are many
other factors that influence the selection function, such 
as instrument characteristics, observation conditions and 
analysis methods, so in general we write

$$\Phi[Y_o, \theta_{co} \,|\, Y, \theta_c, \sigma_N, \theta_{\rm fwhm}, \ldots] \tag{1}$$

where $\theta_{\rm fwhm}$ is the FWHM of an assumed Gaussian beam
and $\sigma_N^2$ is the map noise variance. We illustrate our main
points throughout this discussion with simple uniform 
Gaussian white noise. The dots represent other possible 
influences on the selection function, such as the detection 
and photometry algorithms employed to construct the cat- 
alog. 

Several useful properties follow from this definition. 
For example, the selection function relates the observed 
counts from a survey to their theoretical value by 



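$$\frac{dN_o}{dY_o\,d\theta_{co}}(Y_o,\theta_{co}) = \int_0^\infty dY \int_0^\infty d\theta_c\, \Phi(Y_o,\theta_{co}|Y,\theta_c)\, \frac{dN}{dY\,d\theta_c}(Y,\theta_c) \tag{2}$$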



A similar relation can be established between the observed 
counts and cluster mass and redshift: 



$$\frac{dN_o}{dY_o\,d\theta_{co}}(Y_o,\theta_{co}) = \int_0^\infty dz \int_0^\infty dM\, \Psi(Y_o,\theta_{co}|z,M)\, \frac{dN}{dz\,dM}(z,M) \tag{3}$$

where $dN/dz\,dM$ is the mass function and $\Psi$ incorporates
the intrinsic and observational scatter in the relation between
$(Y_o,\theta_{co})$ and $(z,M)$ (mass-observable relations).
This is made more explicit by

$$\Psi(Y_o,\theta_{co}|z,M) = \int_0^\infty dY \int_0^\infty d\theta_c\, \Phi(Y_o,\theta_{co}|Y,\theta_c)\, T(Y,\theta_c|z,M) \tag{4}$$

where the function $T$ represents the intrinsic scatter in
the relation between actual flux $Y$ and core radius $\theta_c$, and
cluster mass and redshift.

In general, we may separate the selection function into
two parts, one related to detection and the other to photometry:

$$\Phi(Y_o,\theta_{co}|Y,\theta_c) = \chi(Y,\theta_c)\, F(Y_o,\theta_{co}|Y,\theta_c) \tag{5}$$



The first factor represents survey completeness and is sim- 
ply the ratio of detected to actual clusters as a function of 
true cluster parameters. The second factor quantifies pho- 
tometric errors with a distribution function F normalized 
to unity: 



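$$\int dY_o\, d\theta_{co}\, F(Y_o,\theta_{co}|Y,\theta_c) = 1$$

In the absence of measurement errors we would have
$\Phi(Y_o,\theta_{co}|Y,\theta_c) = \chi(Y_o,\theta_{co})\,\delta(Y_o - Y)\,\delta(\theta_{co} - \theta_c)$,
in which case the observed counts become

$$\frac{dN_o}{dY_o\,d\theta_{co}}(Y_o,\theta_{co}) = \chi(Y_o,\theta_{co})\, \frac{dN}{dY\,d\theta_c}(Y = Y_o, \theta_c = \theta_{co}) \tag{6}$$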



The importance of the selection function for cosmological
studies lies in Eq. (3), which relates the cosmologically
sensitive mass function to the observed catalog distribution.
Accurate knowledge of $\Psi$ is required in order to obtain
constraints on cosmological parameters, such as the
density parameter or the dark energy equation-of-state.
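
On a discrete grid, Eq. (2) with the factorization of Eq. (5) reduces to a matrix product. The Python sketch below illustrates this under simplifying assumptions that are not part of the analysis above: the cells of the $(Y, \theta_c)$ plane are flattened into a single bin index, the photometric scatter is a hypothetical Gaussian in bin index, and all function names and numbers are illustrative only.

```python
import numpy as np

def gaussian_scatter(n_bins, width):
    """Toy photometric scatter F[i, j]: probability that a cluster in
    true bin j is measured in observed bin i. Each column sums to one,
    mirroring the normalization of F. The Gaussian form is an assumption."""
    idx = np.arange(n_bins)
    kernel = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / width) ** 2)
    return kernel / kernel.sum(axis=0, keepdims=True)

def observed_counts(true_counts, chi, scatter):
    """Discrete analogue of Eq. (2) with Phi = chi * F (Eq. 5)."""
    phi = scatter * chi[None, :]   # Phi[i, j] = chi[j] * F[i, j]
    return phi @ true_counts

# Hypothetical numbers, for illustration only.
true_counts = np.array([100.0, 50.0, 20.0, 5.0])
chi = np.array([0.0, 0.5, 1.0, 1.0])      # completeness per true bin
scatter = gaussian_scatter(4, width=0.8)
obs = observed_counts(true_counts, chi, scatter)
```

Because each column of the scatter matrix sums to one, the total number of observed clusters equals the completeness-weighted sum of the true counts; photometric errors only redistribute clusters among observed bins. With an identity scatter matrix the result reduces to the error-free case of Eq. (6).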

3. Simulations 

Detailed study of SZ selection issues requires realistic 
simulations of proposed surveys. Although analytic argu- 
ments do provide significant insight, certain effects, such 
as cluster-cluster blending and confusion, can only be 
fully modeled with simulations. To this end, we have de- 
veloped a rapid Monte Carlo-based simulation tool that 
allows us to generate a large number of realizations of 
a given survey. This is essential in order to obtain good 
measures of the selection function that are not limited by 
insufficient statistics. In this section we briefly outline our
simulation method and our cluster detection algorithm,
leaving details to Delabrouille, Melin & Bartlett (2002) and
Melin et al. (2004).

Unless explicitly stated, the simulations used in this 
work are for a flat concordance model (Spergel et al. 2003)
with $\Omega_M = 0.3 = 1 - \Omega_\Lambda$, Hubble constant of $H_0 = 70$ km/s/Mpc
(Freedman et al. 2001) and a power spectrum
normalization $\sigma_8 = 0.98$. The normalization of
the $M-T$ relation is chosen to reproduce the local
abundance of X-ray clusters with this value of $\sigma_8$
(Pierpaoli et al. 2001). Finally, we fix the gas mass fraction
at $f_{\rm gas} = 0.12$ (e.g., Mohr et al. 1999).



3.1. Method 

Our simulations produce sky maps at different fre- 
quencies and include galaxy clusters, primary CMB 
anisotropies, point sources and instrumental properties
(beam smoothing and noise). In this work, we do not 
consider diffuse Galactic foregrounds, such as dust and 
synchrotron emission, as we are interested in more rudi- 
mentary factors influencing the selection function; we 
leave foreground issues to a future work (as general references,
see Bouchet & Gispert 1999; Tegmark et al. 2000;
Delabrouille, Cardoso & Patanchon 2003).

We model the cluster population using the Jenkins et
al. (2001) mass function and self-similar, isothermal
$\beta$-profiles for the SZ emission. A realization of the linear
density field $\delta\rho/\rho$ within a comoving 3D box, with the observer
placed at one end, is used to construct the cluster
spatial distribution and velocity field. We scale the density 
field by the linear growth factor over a set of redshift slices 
(or bins) along the past light-cone of the observer; a set 
of mass bins is defined within each redshift slice. We then 
construct a random cluster catalog by drawing the number 
of clusters in each bin of mass and redshift according to 
a Poisson distribution with mean given by the mass func- 
tion integrated over the bin. Within each redshift slice, 
we spatially distribute these clusters with a probability
proportional to $1 + b\delta$, where $\delta$ is the density contrast
$\delta\rho/\rho$ and $b$ is the linear bias given
by Mo & White (1996). Comparison of the resulting spatial
and velocity 2-point functions of the mock catalog with
results from the VIRGO consortium's N-body simulations
shows that this method faithfully reproduces the correlations
down to scales of order $10\,h^{-1}$ Mpc.
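
The catalog construction just described can be sketched in a few lines of Python. This is a minimal illustration and not the simulation code itself: the mean counts per bin are hypothetical numbers standing in for the integrated mass function, the density field is a toy white-noise map, and clipping negative weights to zero is an assumption of the sketch.

```python
import numpy as np

def draw_cluster_counts(mean_counts, rng):
    """Poisson-sample the number of clusters in each (z, M) bin.
    mean_counts[iz, im] is the mass function integrated over the bin."""
    return rng.poisson(mean_counts)

def place_clusters(delta, bias, n_clusters, rng):
    """Draw pixel positions with probability proportional to 1 + b*delta.
    Negative weights are clipped to zero (an assumption of this sketch)."""
    weights = np.clip(1.0 + bias * delta, 0.0, None).ravel()
    pix = rng.choice(weights.size, size=n_clusters, p=weights / weights.sum())
    return np.unravel_index(pix, delta.shape)

rng = np.random.default_rng(42)
mean_counts = np.array([[30.0, 5.0, 0.5],    # hypothetical means, low-z slice
                        [60.0, 8.0, 0.2]])   # hypothetical means, high-z slice
counts = draw_cluster_counts(mean_counts, rng)
delta = 0.1 * rng.standard_normal((32, 32))  # toy linear density contrast
rows, cols = place_clusters(delta, bias=2.0,
                            n_clusters=int(counts[0].sum()), rng=rng)
```

In a fuller treatment the bias would depend on mass and redshift and the density field would carry the correct linear power spectrum scaled by the growth factor in each slice.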

Individual clusters are assigned a temperature using 
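a $M-T$ relation consistent with the chosen value of $\sigma_8$
(Pierpaoli et al. 2001)

$$\frac{M}{10^{15}\,h^{-1} M_\odot} = \left(\frac{T}{\beta_p}\right)^{3/2} \left(\Delta_c E^2\right)^{-1/2} \left[1 - 2\,\frac{\Omega_\Lambda(z)}{\Delta_c}\right]^{-3/2} \tag{7}$$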



with $\beta_p = 1.3 \pm 0.13 \pm 0.13$ keV. Here, $\Delta_c$ is the mean
density contrast for virialization (weakly dependent on the
cosmology) and $E(z) = H(z)/H_0$. As mentioned, we distribute
the cluster gas with an isothermal $\beta$-model:

$$n_e(r) = n_e(0) \left[1 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta/2} \tag{8}$$

where we fix $\beta = 2/3$ and the core radius is taken to be
$r_c = 0.1\, r_v$, with the virial radius given by



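$$r_v = 1.69\, h^{-2/3} \left(\frac{M}{10^{15} M_\odot}\right)^{1/3} \left(\frac{\Delta_c}{178}\right)^{-1/3} E^{-2/3}\ {\rm Mpc} \tag{9}$$

The central electron density is determined by the gas mass
fraction $f_{\rm gas}$. For the present work, we ignore any intrinsic
scatter in these scaling relations.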
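
As a worked example of these scaling relations, the following Python sketch evaluates the virial and core radii, assuming the virial radius scales as $r_v = 1.69\,h^{-2/3}(M/10^{15}M_\odot)^{1/3}(\Delta_c/178)^{-1/3}E^{-2/3}$ Mpc and $r_c = 0.1\,r_v$. The function names are illustrative, the default $\Delta_c = 178$ is a placeholder (its actual value depends weakly on cosmology), and $E(z)$ is written for a flat universe with a cosmological constant.

```python
import math

def e_of_z(z, omega_m=0.3):
    """E(z) = H(z)/H0 for a flat universe with a cosmological constant."""
    return math.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))

def virial_radius_mpc(mass_msun, z, h=0.7, omega_m=0.3, delta_c=178.0):
    """Virial radius in Mpc from the scaling relation quoted above.
    delta_c = 178 is a placeholder, not a value taken from the text."""
    return (1.69 * h ** (-2.0 / 3.0)
            * (mass_msun / 1e15) ** (1.0 / 3.0)
            * (delta_c / 178.0) ** (-1.0 / 3.0)
            * e_of_z(z, omega_m) ** (-2.0 / 3.0))

def core_radius_mpc(mass_msun, z, **kwargs):
    """Core radius, taken as one tenth of the virial radius."""
    return 0.1 * virial_radius_mpc(mass_msun, z, **kwargs)

for z in (0.1, 0.5, 1.0):
    print(z, core_radius_mpc(5e14, z))
```

At fixed mass the physical core radius shrinks with redshift through the $E^{-2/3}$ factor; converting it to the angular core radius $\theta_c$ additionally requires the angular diameter distance, which is not computed here.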

In this way we produce a 3 x 3 degree map of 
the SZ sky. Primary CMB anisotropies are added as a 
Gaussian random field by drawing Fourier modes accord- 
ing to a Gaussian distribution with zero mean and vari- 
ance given by the power spectrum as calculated with 
CMBFAST (Seljak & Zaldarriaga 1996). We then populate
the maps with radio and infrared point sources, using
the counts summarized in Bennett et al. (2003) and fitted
by Knox et al. (2004) and the counts from SCUBA
(Borys et al. 2003). Finally, the map is smoothed with
a Gaussian beam and white Gaussian noise is added to 
model instrumental effects. 
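
Drawing a Gaussian random field from a power spectrum can be sketched as follows in Python. The sketch filters white noise in Fourier space, which is equivalent to drawing independent Gaussian Fourier modes with variance set by the power spectrum. It assumes a periodic flat-sky patch, and the spectrum `toy_power` is a hypothetical stand-in for a CMBFAST output, not a realistic CMB spectrum.

```python
import numpy as np

def gaussian_random_field(n_pix, pix_size, power_spectrum, rng):
    """Draw a periodic, flat-sky Gaussian random map.

    pix_size is the pixel side in radians; power_spectrum(k) returns the
    power at angular wavenumber k (rad^-1), with the convention that the
    map variance is the integral of P(k) d^2k / (2 pi)^2.
    """
    k1d = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix_size)
    k = np.hypot(*np.meshgrid(k1d, k1d, indexing="ij"))
    amplitude = np.zeros_like(k)              # k = 0 mode left at zero mean
    amplitude[k > 0] = np.sqrt(power_spectrum(k[k > 0]))
    white = rng.standard_normal((n_pix, n_pix))
    modes = np.fft.fft2(white) * amplitude / pix_size
    return np.fft.ifft2(modes).real

def toy_power(k):
    """Hypothetical smooth spectrum; stands in for a CMBFAST output."""
    return 1e-16 / (1.0 + (k / 1000.0) ** 2) ** 1.5

# 360 pixels of 0.5 arcmin give a 3 x 3 degree patch.
cmb_map = gaussian_random_field(360, np.radians(0.5 / 60.0), toy_power,
                                np.random.default_rng(0))
```

Beam smoothing can be applied in the same Fourier-space step by multiplying the modes by the beam transfer function, and instrumental noise is then added as independent Gaussian values per pixel.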

3.2. Detection Algorithm 



We have developed (Melin et al. 2004) a rapid detection
routine incorporating a deblending algorithm that is
based on matched filtering (Haehnelt & Tegmark 1996),
for single frequency surveys, and matched multi-filtering
(Herranz et al. 2002), for multi-frequency surveys. Recall
that in this work we only examine single frequency surveys.
The matched filter, on a scale $\theta_c$, is defined to yield
the best linear estimate of the amplitude of the SZ signal
from a cluster with (matched) core radius $\theta_c$. It depends on
both the beam-smoothed cluster profile $T_c$ and the noise
power spectrum $P(k)$. In Fourier space it is given by



$$\hat F(\mathbf{k}) = \left[\int \frac{|\hat T_c(\mathbf{k}')|^2}{P(k')}\, \frac{d^2k'}{(2\pi)^2}\right]^{-1} \frac{\hat T_c^*(\mathbf{k})}{P(k)} \tag{10}$$

where $P = (P_{\rm cmb} + P_{\rm sources})|\hat B|^2 + P_{\rm ins}$, $\hat T_c$ is the Fourier
transform of the beam-smoothed cluster profile $T_c$, $\hat B$ is
that of the instrumental beam (a Gaussian), and $P_{\rm cmb}$,
$P_{\rm sources}$ and $P_{\rm ins}$ represent the power spectra of the primary
CMB anisotropies, residual point sources and instrumental
noise, respectively. We denote the standard deviation
of the noise (including primary CMB and residual
point sources) passed through the filter at scale $\theta_c$ by $\sigma_{\theta_c}$,
and give its expression for future reference:

$$\sigma_{\theta_c} = \left[\int \frac{|\hat T_c(\mathbf{k})|^2}{P(k)}\, \frac{d^2k}{(2\pi)^2}\right]^{-1/2} \tag{11}$$

This is the fluctuation amplitude of the filtered signal in
the absence of any cluster signal.

We can summarize the detection algorithm in three 
steps: 

— Filter the observed map with matched filters on different
scales $\theta_c$ in order to identify clusters of different
sizes. This produces a set of filtered maps.

— In each filtered map, find the pixels that satisfy S/N >
threshold (e.g., 3 or 5). Define cluster candidates as
local maxima among these pixels. At this point, each
cluster candidate - in each map - has a position, size
(that of the filter that produced the map), and an SZ
flux given by the signal through the matched filter.

— Identify cluster candidates across the different filtered
maps using a tree structure (the same cluster can obviously
be detected in several filtered maps) and eliminate
multiple detections by keeping only cluster properties
corresponding to the highest S/N map for each
candidate.
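
The three steps can be sketched in Python as follows. This is an illustrative sketch only, not the detection code of Melin et al. (2004): the function names are hypothetical, the noise power array is supplied by the caller in numpy's FFT convention, the map is treated as periodic, no deblending is attempted, and the association across scales simply keeps the highest S/N per pixel instead of building a tree structure.

```python
import numpy as np

def beta_template(n_pix, pix, theta_c, fwhm):
    """Beam-smoothed beta = 2/3 profile, unit central value before smoothing,
    truncated at 10 core radii and centred on the map (angles in arcmin)."""
    y, x = np.indices((n_pix, n_pix)) - n_pix // 2
    r = np.hypot(x, y) * pix
    profile = np.where(r <= 10.0 * theta_c,
                       (1.0 + (r / theta_c) ** 2) ** -0.5, 0.0)
    k1d = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix)
    k2 = k1d[:, None] ** 2 + k1d[None, :] ** 2
    beam_k = np.exp(-0.5 * k2 * fwhm ** 2 / (8.0 * np.log(2.0)))
    smoothed = np.fft.ifft2(np.fft.fft2(np.fft.ifftshift(profile)) * beam_k).real
    return np.fft.fftshift(smoothed)

def matched_filter(sky, template, noise_power):
    """Return the map of amplitude estimates and the filtered-noise sigma.
    noise_power[k] is the expected |FFT(noise)|^2 in numpy's convention,
    e.g. n_pix**2 * sigma_pix**2 for white noise."""
    t_k = np.fft.fft2(np.fft.ifftshift(template))
    norm = np.sum(np.abs(t_k) ** 2 / noise_power)
    corr = np.fft.ifft2(np.conj(t_k) * np.fft.fft2(sky) / noise_power).real
    return sky.size * corr / norm, norm ** -0.5

def detect(sky, pix, scales, fwhm, noise_power, threshold=5.0):
    """Keep, per pixel, the scale with the highest S/N, then return the local
    maxima above threshold as (row, col, theta_c, amplitude, S/N)."""
    n = sky.shape[0]
    best_snr = np.full(sky.shape, -np.inf)
    best_amp = np.zeros(sky.shape)
    best_scale = np.zeros(sky.shape)
    for theta_c in scales:
        amp, sigma = matched_filter(sky, beta_template(n, pix, theta_c, fwhm),
                                    noise_power)
        better = amp / sigma > best_snr
        best_snr = np.where(better, amp / sigma, best_snr)
        best_amp = np.where(better, amp, best_amp)
        best_scale = np.where(better, theta_c, best_scale)
    peak = best_snr > threshold
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0):
                peak &= best_snr >= np.roll(best_snr, (dr, dc), axis=(0, 1))
    return [(r, c, best_scale[r, c], best_amp[r, c], best_snr[r, c])
            for r, c in zip(*np.nonzero(peak))]
```

The amplitude returned for each candidate is the estimated central value of the unsmoothed profile; multiplying it by the integral of the profile over solid angle gives the integrated flux. For white noise the filter reduces to a normalized cross-correlation with the template, while a noise power that rises at low wavenumber (as with primary CMB) down-weights large angular scales.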



4. Selection Function for Single Frequency SZ 
Surveys 

We consider a single frequency SZ survey with characteristics
representative of upcoming interferometers (e.g.,
the Arcminute MicroKelvin Imager being constructed
in Cambridge^): a 15 GHz observation frequency, 2 arcmin
FWHM (synthesized) beam and a noise level of
5 $\mu$K/beam. Note that, for simplicity and generality, we
model the observations as a fully sampled sky map in- 
stead of actual visibilities. This approximation should be 
reasonably accurate given the good sampling expected in 
the Fourier plane; it will, however, miss important details 
of the selection function that will require adequate model- 
ing when the time comes. In the same spirit, we also model 
the noise as a white Gaussian random variable with zero 
mean and the given variance. 

During the course of the discussion, we will often compare
the following observational cases: 1) no instrumental
noise (CMB+beam*); 2) the former plus instrumental
noise at 5 $\mu$K/beam; and 3) the previous plus point
sources below a flux limit of 100 $\mu$Jy at 15 GHz. In this
last case, we are assuming that all sources brighter than
the flux limit are explicitly subtracted; for example, both
AMI and the SZA^ plan long baseline observations for
point source removal.

Integrated source counts in terms of total cluster flux
$Y$ (measured in arcmin$^2$) are shown in Figure 1. The theoretical
counts for the fiducial model are given by the solid
black line, while the other curves give the counts from
our simulated observations. They are plotted in terms of
true flux $Y$, except for the red dashed curve that gives the
counts as a function of observed flux $Y_o$, as would actually
be observed in a survey. Differences between the detected 
cluster counts and the theoretical prediction (black solid 
line) reflect catalog incompleteness; the nature of this in- 
completeness is the focus of our discussion. The influence 
of photometric errors is illustrated by the difference be- 
tween the observed counts as a function of observed flux 
(red dashed curve) and the detected-cluster counts given 
as a function of true flux. 



Fig. 1. Cluster counts in terms of integrated $Y$ for the
input concordance model (black solid line) and for detected
clusters: the green dotted line gives the counts
neglecting the effects of instrumental noise and point
sources (CMB+beam=2 arcmin FWHM); the blue dash-dotted
line includes instrumental noise (5 $\mu$K/beam); the
red dash-triple-dotted line further includes the effects
of residual point sources after explicit subtraction of all
sources with flux greater than 100 $\mu$Jy (see text). These
are all plotted as functions of the true total flux $Y$. The
red dashed line shows the observed counts for the latter
case in terms of the observed flux $Y_o$.

Fig. 2. Selection in the parameter plane of total flux $Y$
and core radius $\theta_c$. The three curves correspond to the
different simulated cases, as indicated in the legend; all
correspond to a cut at signal-to-noise of 5. The dot-dashed
lines in the background give contours of constant
mass in this plane; each is parameterized by redshift z.
Note that cluster selection does not follow a simple flux
cut, which would be a horizontal line, nor a simple mass
cut. Photometric errors are neglected in this plot, meaning
that observed cluster parameters $Y_o$ and $\theta_{co}$ equal the
true values $Y$ and $\theta_c$.

^ http://www.mrao.cam.ac.uk/telescopes/ami/index.html

* Note that in this case of no noise, the beam can be perfectly
deconvolved.

^ http://astro.uchicago.edu/sze

4.1. Catalog completeness

It is important to understand the exact nature of the incompleteness
evident in Figure 1, and we shall now demonstrate
that it is not simply a function of total flux. Our
detection algorithm operates as a cut at fixed signal-to-noise,
which leads to the following constraint on (true)
cluster parameters $Y$ and $\theta_c$:



$$Y = y_{\rm est} \int d\Omega\, T_c(\hat n) > \frac{S}{N}\, \sigma_{\theta_c} \int d\Omega\, T_c(\hat n) \tag{12}$$



where $y_{\rm est}$ is the central Compton parameter estimated by
the filter matched to a cluster of core radius $\theta_c$, and the
filter noise on this scale is given by Equation (11). Figure 2
shows the resulting selection curves for our three cases in
the $Y$-$\theta_c$ plane at S/N > 5. Note that we are speaking
in terms of true cluster parameters, leaving the effects of
photometric errors aside for the moment.
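
The dependence of the flux limit on angular size can be made concrete with a small numerical sketch in Python. It evaluates the right-hand side of Eq. (12) for a deliberately simplified case that is not one of the three cases studied here: white pixel noise only, no beam smoothing and no primary CMB, for which the filter noise of Eq. (11) reduces to the pixel rms divided by the root of the summed squared template. The function name and the noise value are hypothetical.

```python
import numpy as np

def flux_limit(theta_c, sigma_pix, pix, n_pix=512, snr=5.0):
    """Minimum detectable integrated flux (arcmin^2) at core radius theta_c
    (arcmin) for white noise of rms sigma_pix per pixel, in units of y.
    Beam smoothing and CMB are deliberately ignored in this sketch."""
    y, x = np.indices((n_pix, n_pix)) - n_pix // 2
    r = np.hypot(x, y) * pix
    profile = np.where(r <= 10.0 * theta_c,
                       (1.0 + (r / theta_c) ** 2) ** -0.5, 0.0)
    sigma_filter = sigma_pix / np.sqrt(np.sum(profile ** 2))  # white-noise Eq. (11)
    solid_angle = np.sum(profile) * pix ** 2                   # integral of T_c dOmega
    return snr * sigma_filter * solid_angle

for theta_c in (1.0, 2.0, 4.0):
    print(theta_c, flux_limit(theta_c, sigma_pix=1e-6, pix=0.25))
```

Even in this idealized case the limiting flux grows roughly in proportion to $\theta_c$ for resolved clusters, so a cut at fixed signal-to-noise is not a cut at fixed flux. Adding primary CMB power to the noise spectrum steepens the rise at large $\theta_c$.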

It is clear from this Figure that cluster selection does 
not correspond to a simple flux cut - it depends rather 
on a combination of both source flux and angular extent. 
The exact form of this dependence is dictated by the noise 
power spectrum, which must be understood to include 
primary CMB anisotropy. That this latter dominates on 
the larger scales can be seen from the fact that the three 
curves approach each other at large core radii. For smaller 
objects, on the other hand, instrumental noise and resid- 
ual point source contamination "pull" the curve towards 
higher fluxes relative to the ideal case that includes only 
CMB anisotropies (dotted line).

For the solid red curve, we calculate the flux variance 
induced by residual point sources at the given filter scale
and then add the equivalent Gaussian noise term to the 
instrumental noise and CMB contributions. One may well 
ask why the source fluctuations should be Gaussian given 
the shallow slope of the radio source counts that would 
normally lead to very non-Gaussian statistics. The fluc- 
tuations are in fact Gaussian, as we have verified with the 
simulations, essentially because the source subtraction is 
performed at higher angular resolution than the small- 
est filter scale; in effect, we have cleaned "below" the fil- 
ter confusion limit, so that the number of sources/filter 
beam is large and we approach the Gaussian limit. This 
realistically reflects what will actually be done with in- 
terferometers using long baseline observations for source 
subtraction. 

The dot-dashed lines in the background of the Figure 
represent contours of constant cluster mass M{Y,9c). 
They result from inversion of the Y{M,z) and 9c{M,z) 
relations, where we associate cluster core radius with fil- 
ter scale. Note that redshift varies along each contour, 
and that we have assumed zero scatter in the relations 
so that the inversion is one-to-one. In reality, of course, 
they contain intrinsic scatter, due to cluster physics, as 
well as observational scatter induced by photometric er- 
rors. The position of these mass contours depends on both 
cluster physics and the underlying cosmology; we may, for 
example, displace the contours by changing the gas-mass 
fraction. The selection curves, in contrast, are independent 
of cosmology and cluster physics, being based on purely 
observational quantities.

Fig. 3. Detection mass as a function of redshift. The blue
long-dashed line shows the result for the case CMB+noise
(blue long-dashed line in Figure 2). The rise at low redshift
is due to confusion with primary CMB fluctuations that
is more important for nearby clusters with large angular
extent. The red dot-dashed line gives the result for a pure
flux-limited catalog (see text), and the black short-dashed
line that for observations without CMB confusion (e.g.,
multi-frequency). Relative to a pure flux-limited catalog,
both observed catalogs lose clusters over a range of redshifts.

Observed clusters populate this plane according to
the distribution $dN_o/dY_o d\theta_{co}$, which depends on cluster
physics, cosmology and photometry; Eq. (3) gives it in
terms of the key theoretical quantity, the mass function. If
photometric errors are assumed to be unimportant, then
Eq. (6) applies and we see that the function $\chi(Y, \theta_c)$ is
a step function taking the value of unity above the selection
curves, and zero below; photometric errors simply
"smooth" the selection function $\Phi$, as manifest by Eq. (5).
Completeness expressed in terms of the function $\chi$ is therefore
independent of cluster physics and cosmology. A more
common way to express completeness is by the ratio of
detected to actual clusters as a function of total flux (or
angular scale). At a given flux, for example, this ratio is
the fraction of clusters falling above the selection curve.
Clearly, it depends on the distribution of clusters over the
plane and is, hence, dependent on cluster physics, cosmology
and photometry. We conclude that the function $\chi$ is
a more useful description of a survey.

Figure 2 provides a concise and instructive view of cluster
selection over the observational plane. We are of course
ultimately interested in the kinds of objects that can be
detected as a function of redshift, and to this end it is useful
to study the detection mass shown in Figure 3. This is
defined as the smallest mass cluster detectable at each redshift
given the detection criteria. For the figure, we assume
that there is no scatter in the $Y_o(M, z)$ and $\theta_{co}(M, z)$ relations
so that a selection curve in the observational plane
uniquely defines the function $M_{\rm det}(z)$. Note that, as emphasized
above, these detection mass curves depend on
the assumed cosmology.

We compare three situations in the figure. The blue 
long-dashed line gives the detection mass for the case 
CMB+noise (single frequency experiment), while the red 
dot-dashed line shows the result for a pure flux-limited 
catalog. The chosen flux cut corresponds to the left-most 
point on the blue long-dashed selection curve in Figure 2
(CMB+noise). Finally, the black short-dashed line gives 
the detection mass for a case with just instrumental noise 
(with the same beam as the previous cases) and no pri- 
mary CMB; this approximates the situation for a multi- 
frequency experiment which eliminates CMB confusion. 
The noise level has been adjusted such that the selection 
curve in the {Y, 0c)-plane matches the previous two cases 
on the smallest scales. With this choice, all three detection 
mass curves overlap at high z as seen in Figure 13 

We see that the observed catalog (blue long-dashed curve)
loses clusters (i.e., has a higher detection mass) over a
broad range of redshifts relative to the pure flux-limited
catalog (red dot-dashed line); the effect is most severe
for nearby objects, whose large angular size submerges them
in the primary CMB anisotropies, but it remains significant
out to redshifts of order unity. This is also reflected in
the redshift distribution of Figure 5, to be discussed
below. We note in addition that even multi-frequency
experiments lose clusters over a rather broad range of
redshifts, as indicated by the difference between the lower
two curves.

Simulations are needed to evaluate the importance of
factors not easily incorporated into the simple analytic
calculation of the cluster selection curve; these include
source blending and morphology, other filtering during data
analysis, etc. Using our simulations, we find that cluster
detection in mock observations closely follows the analytic
predictions, thus indicating that blending does not
significantly change the above conclusions, at least for
the case under study - a 2 arcmin beam with noise at a level
of 5 μK/beam - representative of planned interferometer
arrays. As our current simulations only employ spherical
beta model profiles, they only test for the importance of
blending effects; future work will include more realistic
profiles taken, for example, from hydrodynamical N-body
simulations. The simulations are also crucial for correctly
evaluating the photometric precision of the survey catalog.
Contrary to the situation for cluster detection, we find
that blending greatly affects photometric measurements:
photometric scatter from the simulations is significantly
larger than expected based on the S/N ratio, whether the
threshold is taken at S/N=5 or 3.

4.2. Catalog contamination 

Contamination by false detections is a separate function
that can only be given in terms of observed flux and
angular (or filter) scale; once again, simulations are
crucial for evaluating effects such as blending and
confusion. Figure 4 shows the contamination level in our
extracted catalogs as a function of total flux Y. The level
is significantly higher than expected from the S/N ratio,
indicating that confusion and blending effects are clearly
important. This is most obvious for the case with S/N=3,
where contamination rises towards the high flux end due to
confusion with primary CMB fluctuations that are more
prevalent on larger angular scales. Even at relatively low
flux levels around 10^-? arcmin^2, we see that the
contamination rate remains near or above 10% for the S/N=3
case. This quantifies the expectation that single frequency
surveys will contend with a non-negligible level of
contamination.

Fig. 4. Contamination rate for a single frequency survey as
a function of total flux for two different detection
thresholds. The histograms give the percentage of sources
that are false detections in catalogs extracted from our
simulations.
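
For reference, the contamination expected from the S/N ratio alone can be estimated along the following lines. This is only a sketch under the assumption of purely Gaussian noise; the number of independent resolution elements `n_independent` is a hypothetical input (here taken, as a rough guess, to be the number of 2 arcmin beams in a 3 x 3 square degree map), and the calculation ignores the multiple filter scales used in the actual extraction.

```python
import math


def gaussian_tail(threshold):
    """One-sided probability that Gaussian noise exceeds `threshold` sigma."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


def expected_false_detections(threshold, n_independent):
    """Mean number of noise peaks above threshold among independent elements."""
    return n_independent * gaussian_tail(threshold)


if __name__ == "__main__":
    # Assumed count of independent elements: (180 arcmin / 2 arcmin)^2 beams.
    n_beams = (180 // 2) ** 2
    for snr in (3.0, 5.0):
        print(snr, gaussian_tail(snr), expected_false_detections(snr, n_beams))
```

Comparing such an estimate with the contamination measured in simulations indicates how much of the false-detection rate is due to confusion and blending rather than to noise fluctuations alone.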

4.3. The redshift distribution 

The example of extracting cosmological constraints from
the redshift distribution of SZ detected clusters affords
a good illustration of the importance of understanding the
selection function. These constraints arise from the shape
of the cluster redshift distribution, which is affected by
such parameters as the matter density (Oukbir & Blanchard
1997) and the dark energy equation-of-state (Wang &
Steinhardt 1998); this is in fact one of the primary
motivations for performing SZ cluster surveys (Haiman et
al. 2001). The important point is that the redshift
distribution expected in a given cosmological model also
depends on the catalog selection function. In the following
discussion, we assume that the Y(M, z) and θ_c(M, z)
relations are perfectly known.

Consider the redshift distributions shown in Figure 5
for an observation where residual point source
contamination has been reduced to a negligible level (case
2). The black line represents the theoretical distribution
for clusters with total flux Y > 5 x 10^-? arcmin^2, which
corresponds to the point source detection limit on the
smallest filter scale (leftmost point on the dashed blue
curve in Figure 2). This predicted distribution is very
different from the actual distribution of clusters shown as
the black histogram. It is clearly impossible to deduce the
correct cosmological parameters by fitting a flux-limited
theoretical curve to the observed distribution. This
demonstrates that the point-source flux limit cannot be
used to model the catalog redshift distribution, which is
already clear from the fact that the counts in Figure 1
have already turned over and the catalog is clearly
incomplete.

Fig. 5. Redshift distribution of SZ clusters (case 2 -
without residual point source noise). The black solid and
red dashed curves give the theoretically predicted counts
at the two indicated flux limits. Corresponding
distributions for the simulated recovered counts, with the
same two flux cuts on the true Y, are shown by the black
and the red dashed histograms; the small difference between
the two reflects the flat observed counts in Figure 1. The
lighter, green histogram shows the simulated counts cut at
an observed flux of Y_o > 10^-? arcmin^2.

One can try to cut the catalog at a higher flux limit
of Y > 10^-? arcmin^2, where the observed counts just begin
to flatten out and incompleteness is not yet severe.
Comparison of the dashed red line - theoretically predicted
counts at this flux limit - with the red dashed histogram
shows that the observed distribution still differs
significantly from the predicted flux-limited redshift
distribution. Modeling the observed catalog as a pure flux
cut would again lead to incorrect cosmological constraints.
In order to extract unbiased parameter estimates, one must
adequately incorporate the full catalog selection criteria.

We may illustrate this point by considering the effect of
an un-modeled CMB power excess at high l, such as suggested
by the CBI experiment (Mason et al. 2003). As we have seen
in Figure 2, the primary CMB fluctuations influence the
exact form of the selection curve in the (Y, θ_c) plane;
their power on cluster scales must therefore be accurately
known to correctly model the cluster selection function.
The black curve and black histogram in Figure 6 repeat the
results of Figure 5 for a cut at Y > 5 x 10^-? arcmin^2. In
particular, the black histogram gives the redshift
distribution of clusters extracted from simulations
including a CMB power spectrum corresponding to the
concordance model. The blue (lower) histogram shows the
redshift distribution for clusters extracted from
simulations in which additional CMB power has been added at
high l - a constant power of l(l + 1)C_l/2π = 20 μK was
smoothly joined to the concordance model CMB spectrum (just
below l = 2000) and continued out to l = 3000. Instead of
plunging towards zero, as expected of the primary CMB
fluctuations in the concordance model, this second model
levels off at a constant power level on cluster scales.
This has an important effect on cluster detection, as
clearly evinced in the figure.

We now examine the effect of ignoring this excess 
power in an analysis aimed at constraining cosmological 
parameters. This means that we ignore the excess both in 
the construction of the matched filter and in the selection 
function model needed for the fit. The former has only a 
relatively minor effect on the catalog extraction and ob- 
served histogram. The second effect is much more serious, 
as we now demonstrate. 

Consider constraints on the parameter pair (Ω_M, Ω_Λ)
obtained by fitting models to the redshift distribution of
a 3x3 square degree survey. Note that the histograms shown
in the figures are in fact averages taken over an ensemble
of 50 such simulations, to avoid confusing statistical
fluctuations. For the present example, however, we fit
models to the redshift distribution from a single
simulation. During the fit, we fix the Hubble parameter to
its standard value (H_0 = 70 km/s/Mpc) and adjust the power
spectrum normalization σ_8 to maintain the observed
present-day cluster abundance (following Pierpaoli et al.
2001). For our simplified case of zero-scatter relations
between (Y_o, θ_co) and (M, z), both the selection function
Φ and the intrinsic scatter function T contain Dirac delta
functions that
collapse the various integrals in Eqs.|2|and^ We then ob- 
tain the following expression for the redshift distribution 
of observed clusters brighter than a flux of Y_o:

\[ \frac{dN}{dz}(>Y_o) = \int_{M(Y_o, z)}^{\infty} dM \, \chi[Y(M, z), \theta_c(M, z)] \, \frac{dN}{dz\,dM} \qquad (13) \]

where M(Y, z) is the zero-scatter relation between flux
and mass and redshift. All selection effects are
encapsulated in the completeness function χ, whose
dependence on the primary CMB power is the focus of our
present discussion.
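
As an illustration of how Eq. 13 might be evaluated numerically, the sketch below integrates a completeness-weighted mass function above a lower mass limit. It is not the calculation used for the figures: the function names, the power-law `toy_mass_function`, the step-like `step_completeness` and the finite upper limit `m_max` are all hypothetical choices, picked so that the result can be checked analytically. The callable `completeness(mass, z)` stands for the composition χ[Y(M, z), θ_c(M, z)].

```python
import math


def counts_above_flux(z, m_min, completeness, dn_dzdm, m_max=1e17, steps=2000):
    """Trapezoid-rule estimate of the integral in Eq. 13 at redshift z.

    m_min plays the role of M(Y_o, z); m_max is a finite stand-in for the
    upper limit of the integral. The integration variable is ln M.
    """
    ln_lo, ln_hi = math.log(m_min), math.log(m_max)
    h = (ln_hi - ln_lo) / steps
    total = 0.0
    for i in range(steps + 1):
        mass = math.exp(ln_lo + i * h)
        weight = 0.5 if i in (0, steps) else 1.0
        # dM = M dlnM, hence the extra factor of mass.
        total += weight * completeness(mass, z) * dn_dzdm(mass, z) * mass * h
    return total


def toy_mass_function(mass, z):
    """Hypothetical dN/dzdM: a pure power law, so the integral is analytic."""
    return mass ** -2.0


def complete(mass, z):
    """chi = 1 everywhere: a purely flux-limited catalog."""
    return 1.0


def step_completeness(mass, z):
    """Assumed chi: clusters below 3e14 Msun are lost to the selection curve."""
    return 1.0 if mass >= 3e14 else 0.0


if __name__ == "__main__":
    z = 0.5
    print(counts_above_flux(z, 1e14, complete, toy_mass_function))
    print(counts_above_flux(z, 1e14, step_completeness, toy_mass_function))
```

Setting χ = 1 recovers the pure flux-limited prediction, so running the same integral with and without the completeness term gives a direct measure of how far a flux-limited model overestimates the counts at a given redshift.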

We consider two cases: the first with the expected
concordance primary CMB power spectrum, the second with
the CBI-like excess power. In the first case, we adopt the
true power spectrum for catalog construction and modeling
of χ - the selection function is properly modeled. In the
second situation, we ignore the excess in both catalog
construction and in fitting - the selection function is
incorrectly modeled. When correctly modeling the selection
function, we find best-fit values of (Ω_M, Ω_Λ) =
(0.325, 0.675). The light black dot-dashed curve in Figure
6 shows that this model reasonably reproduces the predicted
redshift distribution (black solid histogram), and the 1σ
contours in Figure 7 enclose the true (simulation input)
values. The fit is good, with a reduced χ² = 0.94 (34
degrees of freedom). When incorrectly modeling the
selection function, on the other hand, we find biased
best-fit values of (0.4, 0.375), and, as shown in Figure 7,
the true parameter values fall outside the 99% confidence
contours. Furthermore, this biased fit is acceptable, with
a reduced χ² of about 1.1 (31 degrees of freedom), giving
no indication of its incorrectness. The redshift
distribution of this model is shown as the light dashed
(blue) curve in Fig. 6, faithfully reproducing the
(averaged) histogram for this case. This is a particularly
telling example of the importance of the selection
function, because the primary CMB power on cluster scales
is at present not well known. It will have to be
constrained by the same experiments performing SZ cluster
surveys; cosmological constraints will be correspondingly
degraded, a subject we return to in a future work.

Fig. 6. Effect of incorrect modeling of the selection
function. The black continuous curve and black (upper)
histogram repeat the results of Figure 5 for catalogs cut
at a flux of Y = 5 x 10^-? arcmin^2 - the former for a pure
flux-limited catalog, the latter for the clusters extracted
from our concordance model simulations with the expected
primary CMB power spectrum [(Ω_M, Ω_Λ) = (0.3, 0.7)]; note
that the histogram is calculated as the average over 50
simulations of a 3x3 square degree survey field. The light
black, dot-dashed curve is the best-fit model to the
redshift distribution from a single such simulation; the
constraints from this fit are shown in Fig. 7. The lower
(blue) histogram shows the distribution of clusters
extracted from the same 50 simulations, but with excess
primary CMB power added at high l (see text); once again,
the histogram is the average over the ensemble of
simulations. The blue dashed curve shows the best fit for
the same realization as before - but now with the excess -
when ignoring the excess in the fitting (incorrect
selection function modeling). Corresponding constraints are
shown in Fig. 7. Both fits are statistically acceptable
(see text).

Fig. 7. Confidence contours for the fits discussed in
Figure 6, shown for a survey covering 3 x 3 sq. degrees.
The upper (black) contours correspond to the case where the
selection function is correctly modeled (no excess CMB
power at high l); the best-fit parameters are (Ω_M, Ω_Λ) =
(0.325, 0.675) and the 1σ contours fully enclose the true
(simulation input) cosmological values of (0.3, 0.7). The
larger (blue) contours represent the situation when the CMB
excess is not properly accounted for by the selection
function model. The best-fit parameter values are
significantly biased - (0.4, 0.375) - and the true
parameter values lie outside the 99% contour. In both cases
the fits are acceptable (see text).

For another example of incorrect modeling of the selection
function, consider that β and θ_c of real clusters may not
behave as we assume when constructing the matched filter.
This will bias flux measurements and displace the selection
curve in the (Y, θ_c) plane relative to our expectations,
leading to an incorrect selection function model. As above,
this will yield biased parameter estimates.

As a final note, and returning to Figure 5, we show the
distribution of detected clusters at the higher flux cut as
a function of observed flux with the lighter, green
histogram. The difference with respect to the corresponding
distribution in terms of true flux (the red, dashed
histogram) reflects statistical photometric errors; note
that in fact this tends to falsely increase the number of
objects seen at the higher redshifts. Although in this case
photometric errors are of secondary importance to the
observed redshift distribution (completeness effects
dominate), they must also be fully accounted for in any
cosmological analysis.

5. Discussion and Conclusions 

Our aim has been to emphasize the importance of
understanding the SZ cluster selection function, as for any
astronomical survey. We proposed a general definition of
the selection function that can be used to directly relate
theoretical cluster distributions to observed ones, and
which has the nice property of clearly separating the
influence of catalog incompleteness and photometric errors.
It is a function of both observing conditions and of the
detection and photometry algorithms used to construct the
survey catalog. Defined over the (true) total flux-angular
size plane, however, the selection function is independent
of cosmology and cluster physics; its connection to
theoretical cluster descriptors, such as mass and redshift,
on the other hand, depends on both. A common way of quoting
incompleteness in terms of total flux is similarly
sensitive to cluster physics and underlying cosmology.

Using a matched spatial filter (Melin et al. 2004), we
studied the selection function for single frequency SZ
surveys, such as will be performed with upcoming
interferometers^1. Our main result is that an SZ catalog is
not simply flux limited, and this has implications for
cosmological studies. A simple analytic argument shows the
exact manner in which catalog selection depends on both
cluster flux and angular size; simulated observations
indicate that this simple estimate is quite accurate and
little affected by blending, although future work needs to
take into account more realistic cluster profiles. We also
noted that noise induced by residual point sources tends to
be Gaussian, because subtraction of the brightest sources
will be done at higher angular resolution than the smallest
filter scale in the SZ maps.

The implications for cosmological studies were illustrated
with the redshift distribution, which will serve to
constrain cosmological parameters in future surveys.
Theoretical redshift distributions based on a simple flux
limit cannot fit observed distributions; at best they would
lead to biased estimates of cosmological parameters. One
must incorporate the complete selection criteria depending
on both flux and angular extent, and hence have a good
understanding of the catalog selection function. This
understanding depends on a number of astrophysical factors
in addition to instrumental parameters. Our example of an
unmodeled primary CMB power excess (relative to the adopted
concordance model) on small angular scales (l > 2000)
highlights the point: we obtained biased parameter
estimates because the selection function was incorrectly
modeled; note that the false fit was in fact a good fit to
the data, according to the χ². Other factors, for example
cluster morphology and its potential evolution, will also
play a role. In the particular case of the CMB power
excess, we note that accurate knowledge of the primary CMB
power on cluster scales will come from the same experiments
performing the cluster surveys. It will be necessary to
constrain the primary CMB power at the same time as cluster
extraction, a point we return to in a future work.

An issue currently receiving attention in the literature
concerns SZ survey "calibration", by which is meant the
empirical establishment of the Y(M, z) relation. This is
clearly essential for any cosmological study. The fact that
SZ catalog selection depends not only on total flux but
also on angular size complicates the question of survey
calibration, for it implies that one must additionally
establish a θ_c(M, z) relation, or its equivalent with some
other angular size measure. In fact, since the dispersion
on Y and θ_c will in general be correlated, we need the
full joint distribution for these observables as a function
of mass and redshift. Photometric errors, which we find can
be significant, further complicate the issue by increasing
scatter in observed relations and hence making them more
difficult to obtain.

^1 Although we have not here modeled the actual data taking
in the visibility plane.
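
The need for a joint distribution can be illustrated with a short sketch that draws correlated scatter in Y and θ_c about their mean relations at fixed mass and redshift. The log-normal form, the scatter amplitudes and the correlation coefficient used here are assumptions chosen purely for illustration; the text does not specify the form of the joint distribution.

```python
import math
import random


def sample_observables(ln_y_mean, ln_theta_mean, sigma_y, sigma_theta, rho,
                       n_samples, seed=0):
    """Draw (Y, theta_c) pairs with correlated log-normal scatter about the means."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        g1, g2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Cholesky factor of the 2x2 covariance matrix of (ln Y, ln theta_c).
        d_ln_y = sigma_y * g1
        d_ln_theta = sigma_theta * (rho * g1 + math.sqrt(1.0 - rho ** 2) * g2)
        pairs.append((math.exp(ln_y_mean + d_ln_y),
                      math.exp(ln_theta_mean + d_ln_theta)))
    return pairs


def log_correlation(pairs):
    """Sample correlation coefficient between ln Y and ln theta_c."""
    xs = [math.log(y) for y, _ in pairs]
    ys = [math.log(t) for _, t in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical means at one (M, z): Y = 1e-4 arcmin^2, theta_c = 1 arcmin.
    draws = sample_observables(math.log(1e-4), 0.0, 0.2, 0.3, 0.6, 20000)
    print(log_correlation(draws))
```

Because selection acts on the pair (Y, θ_c), the fraction of such draws falling above a given selection curve depends on the correlation coefficient as well as on the two individual scatters, which is why the marginal relations alone are not sufficient.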

Although in this work we have focused our detailed
study on single frequency surveys, the general conclusions
should carry over to multiple frequency observations. In
closing we note that the selection function obviously has
equally important implications for other studies based on
SZ-detected cluster catalogs, such as spatial clustering.
For many of these studies, photometric errors, which we
have only briefly touched on here, will take on even
greater importance.